Integrating multi-platform assembly to recover MAGs from hot spring biofilms: insights into microbial diversity, biofilm formation, and carbohydrate degradation

Background Hot spring biofilms provide a window into the survival strategies of microbial communities in extreme environments and offer potential for biotechnological applications. This study focused on green and brown biofilms thriving on submerged plant litter within the Sungai Klah hot spring in Malaysia, characterised by temperatures of 58–74 °C. Using Illumina shotgun metagenomics and Nanopore ligation sequencing, we investigated the microbial diversity and functional potential of metagenome-assembled genomes (MAGs) with specific focus on biofilm formation, heat stress response, and carbohydrate catabolism. Results Leveraging the power of both Illumina short-reads and Nanopore long-reads, we employed an Illumina-Nanopore hybrid assembly approach to construct MAGs with enhanced quality. The dereplication process, facilitated by the dRep tool, validated the efficiency of the hybrid assembly, yielding MAGs that reflected the intricate microbial diversity of these extreme ecosystems. The comprehensive analysis of these MAGs uncovered intriguing insights into the survival strategies of thermophilic taxa in the hot spring biofilms. Moreover, we examined the plant litter degradation potential within the biofilms, shedding light on the participation of diverse microbial taxa in the breakdown of starch, cellulose, and hemicellulose. We highlight that Chloroflexota and Armatimonadota MAGs exhibited a wide array of glycosyl hydrolases targeting various carbohydrate substrates, underscoring their metabolic versatility in utilisation of carbohydrates at elevated temperatures. Conclusions This study advances understanding of microbial ecology on plant litter under elevated temperature by revealing the functional adaptation of MAGs from hot spring biofilms. In addition, our findings highlight potential for biotechnology application through identification of thermophilic lignocellulose-degrading enzymes. 
By demonstrating the efficiency of hybrid assembly utilising Illumina-Nanopore reads, we highlight the value of combining multiple sequencing methods for a more thorough exploration of complex microbial communities. Supplementary Information The online version contains supplementary material available at 10.1186/s40793-024-00572-7.


Introduction
Many hot springs in Southeast Asia have been transformed into parks, while only a few still retain their natural surroundings with introduced lignocellulosic plant litter. These sites therefore provide an excellent opportunity to interrogate microbial adaptation to thermophilic utilisation of plant carbohydrate polymers. Research on thermophiles holds potential in biotechnology, especially in industries such as bioremediation, biomass conversion, and pulping [1]. Carbohydrate-active enzymes (CAZymes) with inherent thermostability are a particularly promising resource from these environments [2].
The isolation of novel thermophiles, a crucial step in unlocking the potential of biomolecular resources, presents significant challenges. Amid these challenges, the metagenome-assembled genome (MAG) approach has emerged as a promising strategy. Currently, most MAG research in the context of hot springs has employed Illumina short-read sequencing, as reflected in various studies [3][4][5][6][7][8][9]. To meet the Minimum Information about a Metagenome-Assembled Genome (MIMAG) guidelines, metagenomic bins must exhibit over 80% completeness and less than 5% contamination [10]. Unfortunately, the assembly of short reads often results in fragmented MAGs. Long-read sequencing has emerged as an alternative, with platforms such as PacBio and Nanopore gaining traction. Kato et al. [11] demonstrated the feasibility of PacBio HiFi long reads for hot spring samples, generating an output of 27.96 Gbp with an N50 of 10,544 bp and recovering 14 complete and circularised MAGs. PacBio usually requires a marginally greater quantity and better quality of DNA than Nanopore. There is currently a scarcity of studies employing long-read Nanopore technology alone, or in combination with Illumina sequencing, for hot spring metagenomic sequencing and MAG assembly.
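The completeness and contamination cut-offs quoted above can be expressed as a simple filter. The following is a minimal sketch, assuming the thresholds exactly as stated in the text (over 80% completeness, under 5% contamination); the function name, the bin names, and the example values are hypothetical and are not taken from this study.

```python
def passes_bin_filter(completeness, contamination,
                      min_completeness=80.0, max_contamination=5.0):
    """Return True when a bin exceeds the completeness cut-off and
    stays below the contamination cut-off (both in percent)."""
    return completeness > min_completeness and contamination < max_contamination


# Hypothetical bins: name -> (completeness %, contamination %)
bins = {
    "bin_1": (95.2, 1.3),   # passes both criteria
    "bin_2": (82.0, 6.1),   # fails on contamination
    "bin_3": (64.5, 0.8),   # fails on completeness
}

kept = [name for name, (comp, cont) in bins.items()
        if passes_bin_filter(comp, cont)]
```

Keeping the thresholds as parameters makes it easy to apply a different quality tier without changing the filtering logic.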
Several metagenomic studies have reported microbial metabolic adaptation in hot springs based on sequencing data. For example, the distribution and putative role of complete ammonia oxidation (comammox) bacteria in Qinghai-Tibetan Plateau hot springs was revealed [12]. Another study unravelled degradation pathways for lignin-derived aromatic compounds in thermal swamps [13]. In various locations, including tropical hot springs and Yellowstone National Park, amplicon sequencing and shotgun metagenomics revealed the significance of carbohydrate-utilising microorganisms [14,15]. A study in an Indian hot spring employed functional gene prediction tools (Tax4Fun and Phylogenetic Investigation of Communities by Reconstruction of Unobserved States, PICRUSt) on amplicon data to estimate widespread carbohydrate utilisation across a thermal gradient (43-65 °C) [16]. MAGs and metatranscriptomic data from this site were subsequently analysed [7,17]. Collectively, these studies emphasise the importance of microbial metabolic adaptation and carbohydrate utilisation across hot spring environments, providing valuable insights into their ecological roles and potential applications.
In this study, our focus was on the Malaysian Sungai Klah (SKY) geothermal hot spring park, situated in a tropical forest abundant with plant litter and featuring two major types of biofilms [14]. Notably, previously published hot spring metagenomes and MAGs came from sites that lacked plant litter in the water. The understanding of the microbiome and its role in adaptation within such ecosystems remains limited. A hybrid assembly strategy combining Illumina and Nanopore reads was employed in this study for MAG construction. This effort not only provided a comprehensive view of microbial diversity and functional potential but also opened avenues for biotechnological applications, particularly in the realm of carbohydrate degradation within hot spring ecosystems.

Sampling
The Sungai Klah hot spring is located in Peninsular Malaysia in a tropical rainforest climate (3°59′50.50′′N and 101°23′35.51′′E). The park has a shallow main stream with temperatures of 60-100 °C, a pH range of 7-9, and minimal plant litter; we have previously described the microbial diversity in the water and sediment of these microhabitats [18]. Alongside the main stream at the SKY site, submerged leaves and woody plant litter occur in various stages of decomposition. These include contributions from a diverse array of plant taxa, such as Vitex, Ficus, Stenochlaena, and Adenanthera. The spring head (71-74 °C, pH 8.5) supported brown biofilms whilst green biofilms (58-64 °C, pH 8.5) developed on the surface of the plant litter bed in geothermal waters (Fig. 1) [14,19]. Sampling was conducted in November 2019 and August 2020 as previously described. Biofilms were recovered and samples preserved at −20 °C prior to processing. In brief, green biofilms were randomly collected within a half-foot radius into sterile tubes. Approximately 11 feet away from the green biofilm site, we obtained brown biofilm samples in multiple replicates. Approximately 500 mg of wet biofilm from each collected sample underwent cell lysis in a TissueLyser II (Qiagen, Hilden, Germany), and genomic DNA was purified using the FastDNA Spin Kit for Soil (MP Biomedicals, Solon, USA). High-quality DNA extracts were pooled before sequencing.

Illumina shotgun metagenome sequencing and MAGs assembly
Our previously sequenced Illumina libraries [14] were used to assemble MAGs as follows. Each of the two green biofilm samples and two brown biofilm samples yielded approximately 20 Gbp (66.5 million paired-end reads) of data from the Illumina NovaSeq 6000 sequencer. We used a co-assembly approach for the biofilm FASTQ files.
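As a quick consistency check on the reported yield, the arithmetic below relates read-pair count to total bases. It is a sketch only: the read length of 150 bp per read is an assumption (the text does not state it), and the function name is hypothetical.

```python
def expected_yield_bp(read_pairs, read_length_bp, reads_per_pair=2):
    """Total sequenced bases for a paired-end run."""
    return read_pairs * reads_per_pair * read_length_bp


# 66.5 million read pairs per sample (from the text);
# 150 bp per read is an ASSUMED read length, not stated in the source.
yield_bp = expected_yield_bp(66_500_000, 150)
yield_gbp = yield_bp / 1e9
print(f"{yield_gbp:.2f} Gbp")
```

Under that assumed read length the result is close to the approximately 20 Gbp reported per sample.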

Nanopore sequencing and MAGs assembly
For Nanopore long-read sequencing, DNA from the brown and green biofilms was each run on one flow cell using the Nanopore-suggested protocol SQK-NBD112-24 (Q20+ chemistry on an R10.4 flow cell), bypassing DNA shearing to obtain long reads. We followed the Nanopore Ligation Sequencing Kit SQK-LSK112 protocol, which included AMPure treatment, adapter ligation, purification, and quantification using Qubit, before loading the samples onto a MinION R10.4 flow cell housed in a Mk1C that operated with MinKNOW v.22.05.8. Reads were basecalled using MinKNOW coupled with Guppy v6.1.5, with the basecalling configuration set to FLO-MIN112-Super-Accurate. Barcodes were removed, and reads were trimmed and filtered; those that passed Q20 (< 5% error) were subjected to de novo assembly using Flye v2.9 (parameters: --nano-hq --meta). MAGs were constructed using metaWRAP v1.3 with the binning tools MetaBAT2 v2.12.1, CONCOCT v1.0.0, and MaxBin2 v2.2.6. Other modules were carried out as described above for the Illumina workflow, except that the Reassembled_bin module was not carried out due to the lack of paired-end information in Nanopore reads. Unless specified, the bioinformatics software was run with default parameters.

Phylogenomic analysis of MAGs
GToTree v1.8.1 was used to analyse the phylogenomics of all MAGs (parameters: -H Bacteria_and_Archaea -D -T IQ-TREE). A set of 25 marker genes shared by bacteria and archaea was used in the analyses. The output tree files (.tre) were visualised in iTOL v6.8, with the MAGs coloured and annotated according to the taxonomy assigned by GTDB-Tk.

Functional annotation of MAGs
Prodigal v2.6.3 was utilised to identify open reading frames (ORFs). Protein sequences related to heat shock proteins were matched against the heat shock protein information resource (HSPIR) database [29]. Proteins involved in carbohydrate utilisation were identified via the Carbohydrate-Active EnZymes (CAZy) database using run_dbCAN v3.0 [30]; sequences were retained only if at least one of the three search tools (HMMER, Hotpep, or Diamond) returned a positive hit. Sequences of ABC-type sugar transporters, major facilitator superfamily (MFS), sodium solute symporters, and the phosphotransferase system were retrieved from InterPro or UniProtKB. If required, sequence verification was conducted using Diamond v2.0.14 or BlastP searches against the NCBI non-redundant (nr), SwissProt, InterPro, and Protein Data Bank (PDB) databases.
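The retention rule for CAZyme candidates (a positive result from at least one of HMMER, Hotpep, or Diamond) can be sketched as below. This is an illustration under stated assumptions, not the authors' script: the row layout, the use of "-" to mark a missing hit, and all gene identifiers are hypothetical.

```python
# ASSUMPTION: a missing hit is recorded as "-" or an empty string.
NO_HIT = {"", "-"}
TOOLS = ("HMMER", "Hotpep", "DIAMOND")


def tools_supporting(row, tools=TOOLS):
    """List the tools that reported a CAZy family for this protein."""
    return [t for t in tools if row.get(t, "-").strip() not in NO_HIT]


def keep_cazyme(row, min_tools=1):
    """Retain a protein when at least `min_tools` tools report a hit."""
    return len(tools_supporting(row)) >= min_tools


# Hypothetical annotation rows, one per predicted protein.
rows = [
    {"gene": "g1", "HMMER": "GH13(20-310)", "Hotpep": "GH13", "DIAMOND": "GH13"},
    {"gene": "g2", "HMMER": "-", "Hotpep": "-", "DIAMOND": "GH3"},
    {"gene": "g3", "HMMER": "-", "Hotpep": "-", "DIAMOND": "-"},
]

selected = [r["gene"] for r in rows if keep_cazyme(r)]
```

Raising `min_tools` to 2 would give a stricter, consensus-based selection at the cost of discarding single-tool hits such as `g2`.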

Site description
SKY hot spring is a unique high-temperature spring known to be filled with plant litter. In a previous publication, we investigated prokaryotic and eukaryotic diversity in two biofilms using 16S and 18S rRNA amplicon sequencing. For a better understanding of the sampling site and the bioinformatics protocol used in this study, please refer to Fig. 1 and our earlier report [14]. The SKY hot spring, with a relatively small water body, experiences limited water movement and exhibits a temperature range of 58-64 °C, while the spring head maintains an average temperature of 71-74 °C, and the average pH value remains around 8.5. The water chemistry analysis revealed the following concentrations: total organic carbon (TOC) at 0.8 mg/L, total nitrogen at 1.8 mg/L, sulfur at 2.7 mg/L, sulphate at 5 mg/L, and bicarbonate at 27 mg/L. Other hot springs in Malaysia that lack plant litter often have TOC values ranging from 0 to 0.4 mg/L.
Based on amplicon data, the green biofilm was dominated by Cyanobacteria (approximately 50-60%). Together with Bacteroidota and Chloroflexota, these three phyla constituted nearly 90% of the total detected amplicon sequence variants (ASVs) [14]. In the brown biofilm, Chloroflexota accounted for half of the ASVs, while major ASVs from Bacteroidota, Thermotogota, and Armatimonadota collectively constituted about 20-40% of the community composition. A diverse presence of other bacterial phyla, each exceeding 1% abundance, was also noted. Crenarchaeota was the sole major archaeal phylum observed in both samples.

Comparison of Illumina-based MAGs, Nanopore-based MAGs, and Illumina + Nanopore hybrid MAGs
We performed shotgun sequencing of the two biofilms collected in Nov 2019 and Aug 2020, with each biofilm subjected to two runs on Illumina NovaSeq, generating approximately 20 Gbp of short reads per run. Assembly was performed using MEGAHIT. With the default metaWRAP pipeline settings, each shotgun sequencing run yielded an average of 70 MAGs per dataset for each sampling (data not shown). Due to the similarity of microbiota types between the 2019 and 2020 samples and the high redundancy between MAGs, we decided to employ a co-assembly strategy. This involved pooling FASTQ files of the same biofilm type (i.e., 20 Gbp + 20 Gbp raw data from green biofilm or brown biofilm) before running the MEGAHIT and metaWRAP pipelines. This strategy allowed us to generate a higher number of MAGs, resulting in 132 medium to high quality MAGs for the green biofilm and 131 medium to high quality MAGs for the brown biofilm (Table 1).
Utilising the Nanopore R10.4 flow cells, we sequenced DNA extracted from green and brown biofilms collected in August 2020. Although the output was lower than that of the R9 series, the R10.4 flow cell with K12 chemistry demonstrated improved accuracy [31]. We obtained approximately 5.2 Gbp of high-accuracy long reads for the green biofilm and 3.6 Gbp for the brown; these long reads were subsequently assembled using Flye. Despite the lower output, we successfully obtained 29 MAGs for the green sample and 16 for the brown biofilm (Table 1). It is worth noting that the average (and median) completeness of MAGs generated by Nanopore is lower than that of Illumina-MAGs. Additionally, the estimated contamination in Nanopore-generated MAGs is relatively high on average (and median) compared to Illumina-MAGs.
To explore the potential benefits of combining Illumina and Nanopore sequencing data, we conducted a hybrid assembly for each of the green and brown biofilms, combining the Illumina reads (20 + 20 Gbp) with the Nanopore reads from the same biofilm type. The assembly process was performed using MEGAHIT or HybridSPAdes. The summary of key statistics for the obtained MAGs is presented in Table 1.
Both hybrid assemblers, MEGAHIT and HybridSPAdes, generated a higher number of total MAGs than Illumina reads alone. In terms of MAG completeness, HybridSPAdes exhibited better performance than MEGAHIT for our dataset. On average, the hybrid assemblers demonstrated an improvement in N50 when compared to Illumina-MAGs. Additionally, hybrid-MAGs showed reduced total numbers of contigs, indicating less fragmented genomes, although this improvement was not consistent throughout the entire dataset.
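Because N50 is the contiguity metric used for this comparison, a short illustration of how it is computed may help. The sketch below uses the standard definition (the length of the contig at which the cumulative sum of contigs, sorted from longest to shortest, first reaches half of the total assembly length); the contig lengths are invented for illustration and are not data from this study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Two hypothetical bins of equal total size (1,500 bp) but
# different fragmentation.
fragmented = [100] * 15
contiguous = [100, 200, 300, 400, 500]

print(n50(fragmented), n50(contiguous))
```

The two toy bins have identical total length, yet the less fragmented one has a fourfold higher N50, which is why N50 and contig count are reported together.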

Taxonomy of dereplicated MAGs
To compare and evaluate the MAGs generated from Illumina, Nanopore, and the hybrid assembly of both, we employed the dRep program. Our goal was to select the highest quality MAGs specific to each biofilm sample, aiding in the identification of reliable and representative genomes for our downstream analysis. The total numbers of selected MAGs are summarised in Table 2.
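The idea behind this selection step, keeping one best-scoring genome from each group of near-identical MAGs, can be sketched as follows. This is a deliberately simplified illustration and not dRep's implementation: the scoring weights are assumptions loosely modelled on a completeness-minus-contamination score with a small N50 bonus, the cluster labels are assumed to be precomputed from ANI, and all MAG names and values are hypothetical.

```python
import math


def quality_score(completeness, contamination, n50,
                  w_comp=1.0, w_cont=5.0, w_n50=0.5):
    """ASSUMED score: reward completeness and contiguity,
    penalise contamination heavily."""
    return (w_comp * completeness
            - w_cont * contamination
            + w_n50 * math.log10(n50))


def pick_representatives(genomes):
    """Return {cluster: name of the highest-scoring genome}."""
    best = {}
    for g in genomes:
        score = quality_score(g["completeness"], g["contamination"], g["n50"])
        current = best.get(g["cluster"])
        if current is None or score > current[0]:
            best[g["cluster"]] = (score, g["name"])
    return {cluster: name for cluster, (_, name) in best.items()}


# Hypothetical MAGs; "cluster" groups genomes assumed to be redundant.
genomes = [
    {"name": "illumina_bin7", "cluster": "c1",
     "completeness": 92.0, "contamination": 2.0, "n50": 15_000},
    {"name": "hybrid_bin3", "cluster": "c1",
     "completeness": 96.0, "contamination": 1.0, "n50": 250_000},
    {"name": "nanopore_bin2", "cluster": "c1",
     "completeness": 71.0, "contamination": 6.0, "n50": 900_000},
    {"name": "illumina_bin12", "cluster": "c2",
     "completeness": 88.0, "contamination": 0.5, "n50": 20_000},
]

representatives = pick_representatives(genomes)
```

In this toy example the long-read bin has the highest N50 but is not selected, because lower completeness and higher contamination outweigh the contiguity bonus under the assumed weights.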
Dereplicated MAGs were obtained separately for the green and brown biofilm data sets; dereplication between the green and brown biofilms was not carried out. Many Nanopore-derived MAGs were not chosen by dRep, probably because their quality was not on par with that of Illumina-derived MAGs or hybrid MAGs generated from both Illumina and Nanopore reads, most likely owing to insufficient sequencing depth.
The majority of MAGs (total 108) in the green biofilm sample were found to be associated with the Bacteria domain and fell within 17 phyla. Phyla with the highest numbers of MAGs included Acidobacteriota with eight MAGs, Bacteroidota with 29 MAGs, Chloroflexota with 20 MAGs, Proteobacteria with 13 MAGs, Planctomycetota with seven MAGs, and Verrucomicrobiota with five MAGs. Additionally, several minority phyla were present in the sample, including Actinobacteriota, Bdellovibrionota, Bipolaricaulota, Deinococcota, Myxococcota, Patescibacteria, Spirochaetota, and others (Fig. 2, Additional file 1: Table S1). We also constructed three MAGs within the Archaea domain, with all three falling under the phylum Thermoproteota.
The MAGs obtained from the green biofilm sample contained five Cyanobacterial taxa, including species within the genera Gloeomargarita and Geminocystis that are novel records for hot springs. Both MAGs are classified in the same class (Cyanobacteriia) and order (Cyanobacteriales), indicating a potential ecological connection. Gloeomargarita is known for its ability to form multicellular filaments, while Geminocystis spp. are solitary, spherical, or slightly oval and non-filamentous [32]. The other three MAGs could only be classified at the family level: Pseudanabaenaceae, Oscillatoriaceae, and Neosynechococcaceae.
A significant number of MAGs in both biofilm datasets remained poorly phylogenetically classified due to limited available taxonomy information. For example, in the green biofilm sample, nearly 50% of the 111 dereplicated MAGs could not be accurately assigned to a specific family based on the nomenclature of cultured type strain representatives. These MAGs lack taxonomic resolution, and a few could be classified only to the order level. Furthermore, a few MAGs from the green biofilm dataset (HRBIN16 and UBA11346) could not even be assigned at the class level, with only their phyla known (Armatimonadota and Planctomycetota).
The brown biofilm, located near the hot spring head with significantly higher temperatures compared to the green biofilm, was subjected to MAG taxonomic analysis (Fig. 3, Additional file 1: Table S1). Archaeal MAGs were more numerous and more diverse in the brown biofilm than in the green biofilm. We detected Archaea from three distinct phyla: Aenigmatarchaeota, Halobacteriota, and Thermoproteota, with MAG counts of 2, 4, and 30, respectively. A few Thermoproteota MAGs very likely represent species from the genera Candidatus Caldarchaeum, Candidatus Korarchaeum, Candidatus Nitrosocaldus, and Ignisphaera, whilst many other MAGs could only be placed at a higher level within the Thermoproteota. In this current work, 16 MAGs related to Chloroflexota were constructed. Figure 4 summarises the average nucleotide identity (ANI) among the Chloroflexota MAGs identified in the two biofilms.

Energy metabolisms in biofilms
Since the taxonomy of the community in the biofilm is complex, we expected to encounter taxa that syntrophically support each other in obtaining nutrients. Green biofilm was dominated primarily by aerobic and anaerobic photoautotrophic Cyanobacteria that derive their energy from light and CO2. In the green biofilm, we identified multiple Chloroflexota MAGs, one of which is likely affiliated with the Chloroflexus genus. However, owing to low ANI similarity, this MAG does not match Chloroflexus aggregans [33] or any other known species of the genus. Another dominant group in the green biofilm was identified as Bacteroidota. We constructed 29 Bacteroidota MAGs, with only one confidently classified as Ignavibacterium. This genus comprises non-phototrophic heterotrophs, suggesting that many other MAGs related to Bacteroidota survive by using organic compounds as energy and carbon sources through chemoheterotrophy.
Another main phylum in green biofilm was Acidobacteriota. Most members of this phylum are probably organotrophs and use chemoautotrophy for energy production. One of the MAGs in the green biofilm has a high ANI with genus Chloracidobacterium. It is worth noting that Chloracidobacterium, typified by Chloracidobacterium thermophilum, is the only chlorophyll (Chl)-dependent phototrophic genus in the phylum Acidobacteriota [34]. In short, in the green biofilm, the main energy sources are derived from light and CO2, with aerobic and anaerobic photoautotrophic Cyanobacteria dominating.
The phyla of the MAGs constructed for the brown biofilm dataset are summarised in Fig. 3. Chloroflexota is known for its ability to perform anoxygenic phototrophy and aerobic respiration. Aquificaceae, Kapabacteriales, Kryptoniaceae, and Armatimonadota inhabit the brown biofilm and are likely to rely on chemoorganotrophic, chemoheterotrophic, or oligotrophic metabolism. Furthermore, certain Thermoproteota MAGs belong to heterotrophic species within the genera Candidatus Caldarchaeum and Candidatus Korarchaeum. Additionally, Candidatus Nitrosocaldus is known for its autotrophic and chemolithoautotrophic characteristics, Candidatus Caldarchaeum is probably chemoorganotrophic or chemoheterotrophic, and Ignisphaera typically exhibits chemoorganotrophic traits [35]. In addition, Pseudothermotoga, another bacterium found in brown biofilm, is thermophilic, anaerobic, fermentative, and hydrogen-producing. Collectively for the brown biofilm, energy sources vary across different phyla.

Macromolecules involved in biofilm formation
Based on genome annotation, Cyanobacterial MAGs detected in the green biofilms contained genes associated with various aspects of type IV pilus biogenesis, twitching motility, and bacterial adhesion in Cyanobacteria, including leader peptidase (prepilin peptidase), pilus biogenesis proteins (PilF, PilM, PilQ), fimbrial assembly ATPase (PilB), type IV pilin (PilA), fimbrial assembly protein (PilC), and twitching motility protein (PilT/PilU family). Pili or fimbriae are short and thin non-flagellar appendages that facilitate Cyanobacterial adherence to surfaces [36]. Additionally, these MAGs showed a noticeable presence of LuxR-family proteins, which are two-component transcriptional response regulators. This suggests a potential role in signal transduction and gene regulation, particularly in processes such as quorum sensing.
Fig. 3 Maximum likelihood phylogenetic tree of MAGs (brown biofilms, BB) based on the alignment of 25 marker genes in GToTree and visualised by iTOL
We identified proteins associated with exopolysaccharide biosynthesis in MAGs related to Cyanobacteria, including the polyprenyl glycosylphosphotransferase (Wzx or Flippase). This exopolysaccharide biosynthesis enzyme plays a crucial role in translocating the repeating units of exopolysaccharides across the inner membrane of bacteria. Interestingly, the MAG related to Neosynechococcaceae exhibited an additional type of polysaccharide biosynthesis specific to hormogonium polysaccharide.
In addition to exopolysaccharide biosynthesis, our analysis revealed the presence of protein sequences related to polysaccharide deacetylases, lipopolysaccharide export systems, polysaccharide export proteins, capsular polysaccharide biosynthesis, and nucleoside-diphosphate-sugar pyrophosphorylase [37].
Besides Cyanobacteria, it is likely that members of the phylum Chloroflexota, represented by MAGs, also contribute to the formation of green biofilm matrices. Of the Chloroflexota MAGs identified in the green biofilm, six showed high ANI values to the genera Chloroflexus, Caldilinea, or Candidatus Roseilinea. These genera are known to form filamentous biofilms [33]. Genomic analysis of several Chloroflexota MAGs revealed genes or proteins associated with pilus assembly, such as general pilus assembly proteins and Flp pilus assembly complex ATPase.
The physical appearance of the brown biofilm suggested greater biocomplexity. It consists of slimy material intertwined with thin, elastic, jelly-like reddish-brown biofilm. The reddish-brown hue is likely caused by Roseiflexus-like MAGs that belong to the phylum Chloroflexota. These MAGs likely contribute to EPS formation through the action of proteins such as polysaccharide biosynthesis protein, polysaccharide biosynthesis C-terminal domain-containing protein, and polysaccharide deacetylase family protein. While the taxonomy of Chloroflexota in the brown biofilm differs from the green biofilm, the overall principle of biofilm formation is expected to be similar.
Occasionally, light grey or whitish fibrous biofilm is also present alongside the brown biofilm, possibly linked to four Aquificaceae-like MAGs. This fibrous biofilm resembles the one documented in thermal streams in Russia [38]. In the constructed MAGs, we identified Type IV twitching motility protein PilT, PilT/PilU family pilus ATPase, pilus assembly protein PilM (closely related to Hydrogenobacter type), pilus assembly protein, and prepilin peptidase. Additionally, the phylum Thermotogota stands out as one of the prominent taxonomic groups identified in the brown biofilm sample, and Thermotoga maritima is known for its capacity to produce exopolysaccharides [39].

Heat stress adaptations
We have visited SKY hot spring multiple times, and we observed fluctuations in water temperature with variations of 3-5 °C. Based on these observations, we anticipate that molecular chaperones play a crucial role in maintaining protein homeostasis and mitigating the impact of protein denaturation and proteotoxicity caused by sudden temperature changes.
To protect their functional proteins, thermophiles employ various strategies including heat shock proteins (HSPs), chaperones, chaperonins, and α- and β-subunit prefoldins [40]. In our study, we conducted protein sequence searches for HSP20, HSP40, HSP60, HSP90, and HSP100 in all the MAGs and visualised them in a heat map (Fig. 5). In general, HSP40 sequences were the most frequently detected, while HSP90 sequences were the least abundant. HSP40, also denoted as DNAJ or DnaJ/Hsp40 homologs, is primarily involved in assisting protein folding, preventing protein aggregation, and maintaining the integrity of protein quality control. HSP60, known as chaperonins (chaperonin GroEL), is crucial for proper protein folding and preventing aggregation. HSP90 proteins (HtpG) are indispensable for protein maturation and stabilisation. HSP100 proteins, including ClpB, belong to an ATP-dependent chaperone family and aid in the disaggregation and reactivation of denatured or aggregated proteins caused by stress conditions.

Starch and lignocellulosic degradation and sugar transporter
An overview of the putative CAZymes in all the dereplicated MAGs is shown in Fig. 6. The proteins were grouped according to the main catalytic reactions, i.e., amylolytic enzymes, cellulolytic enzymes, and hemicellulosic enzymes. Since certain glycosyl hydrolase (GH) families (i.e., GH1, GH2, GH3, GH5, etc.) consist of a mixture of cellulolytic and hemicellulosic enzymes, we have separated them in the heatmap. Extracellularly expressed hydrolases will cleave the carbohydrate polymers, and eventually, a broad range of sugar transporters (Fig. 7) will import the resulting monomers, dimers, or short oligosaccharides for energy production and other biochemical pathways.
Chloroflexota MAGs may express a wide range of hydrolases, each of which performs a specialised function in the breakdown of complex polysaccharides within or adjacent to the biofilm matrix. These putative enzymes are involved in the hydrolysis of starch, pullulan, or glycogen, including α-amylase, trehalose synthase, glycosyl hydrolases and pullulanase, among others. Another set of enzymes target xylan and cellulose, including 1,4-β-xylosidase, endo-1,4-β-xylanase, and β-glucosidase. Arabinogalactan β-L-arabinofuranosidase, endo-1,4-β-galactanase and L-arabinose isomerase participate in the hydrolysis of arabinogalactans, while β-mannosidase and α-mannosidase act on mannose-containing compounds. Collectively, these diverse hydrolases play a crucial role in breaking down complex carbohydrates within or adjacent to the biofilm matrix, highlighting the remarkable metabolic versatility of Chloroflexota MAGs in high temperature plant litter decomposition.
Within our dataset, we discovered representatives of the phylum Armatimonadota, comprising MAGs related to the classes HRBIN16 and HRBIN17 along with additional classes such as Abditibacteria and Fimbriimonadia.
In the context of a high-temperature hot spring, the microbiota appears to engage in a symbiotic decomposition of plant litter. Take, for instance, a MAG affiliated with Bipolaricaulaceae, which features five distinct intracellular β-glucosidases (devoid of signal peptides) but lacks other hydrolases such as cellulases. While other members of the biofilm initiate the degradation of cellulosic materials, this Bipolaricaulaceae bacterium employs its β-glucosidases to break down short oligosaccharides. This enzymatic action converts the oligosaccharides into glucose, which serves as an energy source for the bacterium.
Expanding our investigation, we surveyed the CAZymes more broadly and found many putatively novel enzymes in diverse taxa. Table 3 lists examples of these enzymes, namely endoglucanases, xylanases, β-xylosidases, and β-glucosidases, together with their sequence similarities to the closest known counterparts.

Sequencing strategies and bioinformatic integration
In early 2022, Oxford Nanopore launched an early access ligation sequencing kit (Q20+, K12 chemistry) with over 99% raw read accuracy. Nanopore long reads have two advantages: a larger N50 than Illumina-MAGs, resulting in less fragmented contigs, and longer assembled DNA. However, for our current experiment, the data clearly indicate that a single Nanopore flow cell on MinION is not sufficient to generate a high number of medium- to high-quality MAGs. Although we intended to rerun the frozen DNA extracts on another flow cell, Oxford Nanopore has discontinued the temporary R10.4 version and now offers the R10.4.1 version instead. We decided not to purchase and try the latter version, as doing so would introduce inconsistencies into the experimental setup. While we have not tested it, we would likely need at least two flow cells on MinION for each sample if aiming for a higher total number of constructed MAGs based on Nanopore alone.
Hybrid assembly of Illumina short reads with long reads from other platforms has been a common practice for various applications such as pure genomes, mock communities, and real environmental microbiomes [41,42]. Recent advances in Nanopore (R10.4.1 and onward) and PacBio HiFi reads have challenged the necessity of hybrid assembly. Some reports have demonstrated that long reads alone can be sufficient, provided that the sequencing depth is adequate [11,31]. At the time of writing, Kato et al. [11] is the only team that has demonstrated the use of PacBio HiFi long reads for a Japanese hot spring sample. Given the scarcity of publications that specifically employ long-read techniques on hot spring biofilms, it remains an open question whether hybrid assembly can effectively leverage the advantages of Illumina deep sequencing data and Nanopore long reads within this domain. This question is one of the reasons we adopted the current hybrid sequencing and assembly approach in building MAGs.
In general, our findings suggest that a hybrid assembly of short and long reads is superior to assemblies based solely on short reads (Table 1). We explored hybrid assembly using MEGAHIT, a program that is less computationally intensive but less commonly employed for Illumina-Nanopore hybrid assembly. Conversely, HybridSPAdes, known for its wide application in hybrid assembly, is more computationally demanding. Our study suggests HybridSPAdes generally outperforms MEGAHIT. Notably, however, many of the MAGs chosen by dRep were MEGAHIT hybrid assemblies, and certain MEGAHIT-assembled MAGs were much better in overall quality (Table 2). Thus, researchers planning hybrid assemblies should consider multiple assemblers rather than relying on a single approach.
Our data demonstrate that combining short and long reads substantially improved the overall quality metrics compared with Illumina-MAGs or Nanopore-MAGs alone. Despite this improvement, there are drawbacks, namely the high cumulative sequencing cost and lengthy computation time. Additionally, new Nanopore users may face unpredictable total read output resulting from poor sample pipetting into the flow cell. Researchers working with hot spring biofilms may want to consider HiFi long reads, especially on the PacBio system, as these may overcome the limitations stated above. However, readers should be aware that the quantity and quality of DNA extracted from biofilm may pose a challenge that needs to be overcome.

Microbial assemblages in green and brown biofilms
The green biofilm, abundant in Cyanobacteria, exhibits limited diversity. The thermophilic cyanobacterium Thermosynechococcus (Cyanophyceae) has been reported as prevalent in hot springs in regions such as Singapore and Taiwan [43,44]. Surprisingly, Thermosynechococcus was absent from the SKY hot spring according to both the amplicon and MAG datasets, likely owing to physicochemical factors in the water. Of the five Cyanobacteria MAGs in the green biofilm, four were likely filamentous, attaching to plant litter using EPS and non-flagellar appendages that aid surface adherence. Chloroflexota, many of which are also likely filamentous [33], contribute to the green biofilm matrix too. Cryo-electron tomography of thermophilic Roseiflexus castenholzii and C. aggregans (both Chloroflexota) revealed long pili anchored near septa in multicellular filaments, akin to Cyanobacteria [45].
Cyanobacteria typically have type IV pilin (PilA), while Chloroflexota such as C. aggregans employ distinct Tad or Flp pili [45]. The exact mechanism of Tad pilus-mediated adherence is not fully understood, but these pili are believed to facilitate the attachment of bacteria to specific molecules on surfaces. The green biofilm had a notably loose structure and did not hold together as a compact matrix when manipulated with forceps. It lacked the cohesion expected of a microbial mat and the slimy texture usually associated with extracellular polymeric substances. This loose structure nonetheless provided a favourable environment for colonisation by various types of microbiota. In a reciprocal relationship, the Cyanobacteria in the green biofilm may compensate for their lack of enzymes capable of decomposing plant litter. During periods of abundant sunlight and lower water temperatures, Cyanobacteria can generate their own food through photosynthesis. Under conditions of limited sunlight and elevated water temperatures, however, a different strategy emerges. The loose biofilm structure seems to facilitate a symbiotic relationship, allowing Cyanobacteria to benefit from short sugars produced by extracellular hydrolases secreted by neighbouring microbiota. Nevertheless, this potential symbiosis requires experimental confirmation.
The brown biofilm exhibited a microbiome profile distinct from that of the green biofilm. Based on the previous 16S rRNA bacterial V3-V4 amplicon data, approximately 50% of the total ASVs in the brown biofilm were related to the phylum Chloroflexota, while the remaining ASVs were primarily associated with Thermotoga, Bacteroidota, Acidobacteriota, and Armatimonadota. We initially expected the diversity of Chloroflexota in green and brown biofilms to be somewhat similar. However, it appears that only four MAGs are shared between the two biofilm types, namely those related to the genera Thermoflexus, Roseiflexus, NAK19 (order Anaerolineales), and DRWP01 (order Thermoflexales). According to the constructed MAGs, the brown biofilm is rich in the genera Thermoflexus and Thermomicrobium, as well as several novel taxa belonging to the phylum Chloroflexota, whereas the green biofilm is deficient in these genera (Fig. 4). In general, members of the phylum Chloroflexota can produce energy through both photoheterotrophy and chemoheterotrophy. Certain known members of Chloroflexota have been proposed to be thiotrophs, that is, chemolithotrophs that derive their energy by oxidising inorganic sulfur compounds such as sulfides or elemental sulfur [47]. With recorded temperatures reaching up to 77 °C at the sampling site and an average of 74 °C for the brown biofilm itself, most of the brown biofilm microbiota likely rely on various energy acquisition strategies, including chemoheterotrophy, chemoorganotrophy, chemoautotrophy, or other mechanisms.
The presence of Armatimonadota in the green and brown biofilms raises questions, because few type strains of this phylum have been studied and its potential for biofilm formation remains uncertain. However, a few of the detected MAGs carry genes associated with exopolysaccharide biosynthesis proteins, polysaccharide-modifying proteins, and sequences related to pilus formation. Further investigation is warranted to elucidate the biofilm-forming potential of Armatimonadota-related cultured strains when they become available in the future. Based on the limited understanding available, Armatimonadota perform chemolithotrophy, oxidising inorganic substrates such as hydrogen, sulfur, or iron, sometimes coupling it with autotrophic acetogenesis [11,46]. Researchers posit that this phylum is capable of functioning as chemoorganoheterotrophs and contributes to biogeochemical cycling [8,11].
The phylum Armatimonadota (previously named Armatimonadetes), formerly known as candidate division OP10 (where OP refers to Obsidian Pool at the Yellowstone National Park), is speculated to be a plant biomass degrader. Currently, the phylum holds only a few cultured taxa, including the mesophilic strains Armatimonas rosea, Capsulimonas corticalis, and Fimbriimonas ginsengisoli, and the thermophilic Chthonomonas calidirosea. In a genome analysis of C. calidirosea, the sole culturable thermophilic member of the phylum found so far, researchers noted 65 GH enzymes within the bacterium [62]. However, that study found that a pure culture of C. calidirosea was unable to hydrolyse linear polysaccharides, specifically cellulose (for example cotton, Avicel, and lignocellulosic pulp preparations). In contrast, the MAGs constructed in our study indicated the presence of cellulases (Table 3).
The scientific community should focus not only on the phylum Armatimonadota but also on Chloroflexota. MAGs associated with Chloroflexota encode a diverse range of amylolytic and cellulose-degrading enzymes, highlighting the importance of exploring the enzymatic potential within this phylum for various biotechnological applications. On closer analysis, the protein sequences of the CAZymes in Armatimonadota and Chloroflexota exhibit low similarity to other proteins, warranting further in-depth investigation of these candidates in future studies. We have analysed some CAZyme sequences from Armatimonadota and Chloroflexota, predicted their overall structures using AlphaFold, and examined the protein domain architectures. Our findings suggest that these protein sequences are authentic CAZymes, although their activities require validation in subsequent work.

Conclusion
In this study, we conducted a comprehensive analysis of microbial community composition, taxonomy, adaptation, and metabolic potential within the green and brown biofilms of the SKY hot spring. The green biofilm was primarily composed of Cyanobacteria, alongside other phyla such as Bacteroidota, Chloroflexota, and Acidobacteriota. Unlike typical firm and slimy biofilms, the green biofilm exhibited a loose structure, promoting diverse microbiota colonisation, ecological diversity, and potential symbiotic interactions within the matrix. In contrast, the brown biofilm, thriving at higher temperatures, showed a more diverse composition, encompassing archaea as well as a variety of bacterial phyla. Heat shock proteins were prevalent in both biofilm types, underscoring their pivotal role in maintaining protein stability and countering heat stress-induced protein denaturation and proteotoxicity. Both biofilms contained various CAZymes, signifying the cooperative efforts of diverse microbial taxa in converting plant lignocellulosic biomass. Noteworthy were the Armatimonadota and Chloroflexota MAGs, which showed versatility in carbohydrate metabolism and possessed an array of hydrolases targeting diverse carbohydrates. Current commercial enzyme cocktails for lignocellulosic saccharification are typically sourced from a single mesophilic fungal strain, and these cocktails contain almost the complete range of necessary enzymes. However, our study suggests that in natural geothermally heated environments where plant litter decomposition occurs, finding a single prokaryotic thermophile with all the required industrial enzymes may not be feasible, because thermophiles and their enzymes often act synergistically.

Fig. 2 Maximum likelihood phylogenetic tree of MAGs (green biofilms, GB) based on the alignment of 25 marker genes in GToTree and visualised by iTOL

Fig. 4 Heat map displaying ANI comparison between Chloroflexota MAGs in green and brown biofilms

Table 1
Overall statistics and quality of MAGs assembled in each biofilm type using Illumina reads, Nanopore reads, and Illumina + Nanopore hybrid reads

The Nanopore-MAGs in this work have slightly lower completeness and higher contamination compared with their counterparts among the pure Illumina- or hybrid-MAGs. Despite this, hybridising short and long reads has improved the overall quality metrics; hence, most of the dRep-selected MAGs were from hybrid techniques. In other words, Nanopore sequencing is still essential in this work, because it further enhances the overall quality of MAGs.

Table 2
Summary of dereplicated MAGs

Table 3
Selected putative cellulase, endoglucanase, xylanase, β-glucosidase and β-xylosidase from the selected MAGs and the closest hits. Refer to Additional file 2: Table S2 for the protein sequences